ABSTRACT

A paper given at a conference on statistical computation
discussed teaching statistics with computers. It concluded that
computer-assisted instruction is most appropriately employed in the
numerical demonstration of statistical concepts and for statistical
laboratory instruction. The student thus learns simultaneously about
the use of computers and about those concepts which are best
demonstrated through the use of computers, for example, multivariate
analysis. In an introductory course on statistical inference, computers
are used for weekly laboratory exercises: generating random numbers,
empirical and theoretical distributions, Monte Carlo studies, means,
and the like. However, direct use of the computer in instruction,
namely directions and questions included on-line, is at this time too
expensive. As the cost of computer time decreases it should become more
feasible. Future planning centers around more flexible student
terminals and the development of a battery of computer-administered
tests to further individualize instruction.
I am pleased that the organizers of this Conference on Statistical
Computation saw fit to include a session on the teaching of
statistics with computers. Certainly most of the statistical-computer
effort to date has been directed toward research applications. My
thesis is that we can and should provide computer experience as part
of instruction in statistical methodology, and that such experiences
can be designed to facilitate the learning of basic principles of
statistical inference as well as teach how to use the computer in
the analysis of data.

The general problem of using the computer as an instructional
device has been under investigation for about 10 years. Two recent
surveys of this field, readily available to this audience, are the
articles in the September 1968 issue of Datamation by Zinn and others
and the Atkinson and Wilson (1968) article in Science. Most generally
called computer-assisted instruction (CAI), the field has grown from
a vague idea in 1958 to a multimillion-dollar research enterprise
in 1969.

1 Paper prepared for the Conference on Statistical Computation,
University of Wisconsin Computing Center, April 30, 1969. The research
reported herein was performed pursuant to Contract Nonr-624(18), Personnel
and Training Branch, Psychological Sciences Division, Office of Naval
Research. Additional support was provided by the Office of Education,
U.S. Department of Health, Education, and Welfare.
A variety of different approaches to CAI has emerged from all
this activity. In general they form a spectrum from very rigidly
controlled student-computer interactions, such as drill and practice,
to systems which allow the student to manipulate and operate on
aspects of the subject matter through techniques such as simulation
and gaming.

The cost of CAI makes it impossible at this time to justify 
its use for purely instructional purposes. As an object of research 
CAI is a justifiable enterprise on the assumption that computer costs 
will continue to go down (relative to instructional alternatives) 
while its effectiveness will continue to increase, so that someday 
CAI will be cost-effective for at least some kinds of instruction. 
There is some disagreement as to how far away that someday is (see, 
for example, Oettinger and Marks, 1968), but most agree it is coming. 

One situation in which CAI is feasible today is where the 
student must learn how to use the computer anyway, and where such 
learning is a by-product of his computer-assisted instruction in the 
primary subject. Certainly an example of such a subject area is data 
analysis and statistical inference. An example of such an instruc- 
tional system is the one developed at System Development Corporation 
(Rosenbaum, 1968; Rosenbaum, Feingold, Frye and Bennik, 1967). Using 
the PLANIT language, they wrote three types of student exercises: 

1) tutorial-dialogue: a programmed instruction mode with
   computer questions and student answers.

2) exposition: primarily Monte Carlo type experiments where
   the student-computer "conversations" allow the student to
   specify the kind of experiment he wishes to perform and
   then define the parameters for that experiment.

3) computational exercises: data analysis experiences with
   contrived or randomly generated data.

After two years of studying these three CAI modes the authors concluded 
that "CAI is most appropriately employed in the numerical demonstra- 
tion of statistical concepts and for statistical laboratory exercise 
instruction" (Rosenbaum et al., 1967, p. 1).


In the fall of 1967 we began to develop a computer laboratory 
for statistics instruction which took advantage of the availability 
of the University of Pittsburgh's time-sharing system. Today we are 
providing two kinds of experiences in these computer lab sessions. 

Monte Carlo studies are employed in which the student can examine the 
sampling distributions of the statistic he is studying in class and 
note the effects which occur as a result of varying parameters. The
other type of laboratory experience is in data analysis. Here the
computer takes on the arithmetic chores and frees the student's 
intelligence for considerations such as the selection of appropriate 
variables and samples, choice of the statistical program to be applied, 
and interpretation of the results. 
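As a rough modern illustration of the first kind of lab experience, the sketch below draws repeated samples from a unit-normal population and shows how the spread of the sample mean shrinks as the sample size grows. It is a hypothetical Python example, not a reconstruction of the Pittsburgh programs; the function names, the sample sizes (4, 16, 64), the number of replications, and the seed are all illustrative assumptions.

```python
import random
import statistics


def sample_means(n, replications, rng):
    """Return the means of `replications` samples of size n drawn from N(0, 1)."""
    return [statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
            for _ in range(replications)]


def vary_sample_size(sizes=(4, 16, 64), replications=2000, seed=68940213):
    """Empirical standard error of the mean for each sample size."""
    rng = random.Random(seed)  # fixed seed so the "experiment" can be repeated
    results = {}
    for n in sizes:
        results[n] = statistics.stdev(sample_means(n, replications, rng))
        # Theory: the standard error of the mean is sigma / sqrt(n) = 1 / sqrt(n) here.
        print(f"n = {n:3d}  empirical SE = {results[n]:.3f}  "
              f"theoretical SE = {1 / n ** 0.5:.3f}")
    return results


if __name__ == "__main__":
    vary_sample_size()
```

Changing `sizes` plays the role of the student varying a parameter and rerunning the experiment; the empirical column should track the theoretical one closely.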

2 Colleagues and students who have helped me develop this
approach are Paul R. Lohnes, Richard Ferguson, James Carlson, Paul
Stieman, and Anthony Nitko. I am also indebted to Robert Glaser,
Director of LRDC, for some financial support and personal encouragement.
Before examining these laboratory exercises in detail, it would
be useful to describe the time-sharing computer system on which they
have been implemented. At the University of Pittsburgh we have the
IBM System/360 Model 50 with 131K bytes of main storage (2-microsecond
cycle time), a million bytes of large capacity storage (8-microsecond
cycle time), and the 2314 disc with over two hundred million bytes of
capacity. The Pitt Time Sharing System (PTS) currently supports up to
fifty simultaneous users, most of whom operate from 2741 terminals on
dedicated lines.

One feature of the PTS software which we use most heavily in this work 
is the time-sharing editor. The editor proves very useful for the 
initial preparation of source programs and for the continuous creation 
and editing of data for subsequent analyses. The FORTRAN IV compiler
is available on the system, so with the editor we were able to adapt
our existing statistical FORTRAN batch programs readily for interactive
mode.

Programs and data files are stored on the disc and can be
loaded or attached with very simple, typed commands. Additional data 
for analysis can be entered from the terminal, from cards taken to the 
Computer Center, or from tapes stored at the Center. When the user 
logs on, he declares how much core he will need for his current work. 
Up to 131K bytes can be allocated if core is available. Most appli- 
cations seem to use 16K or 32K bytes of core. 


Introduction to Statistical Inference 

Our first course in statistical inference serves about 75 to
100 graduate students in education per trimester. Each student has
a weekly laboratory exercise which he does at his convenience by using
one of several 2741 terminals on the campus to which he has access on 
a sign-up basis. The mimeographed directions for each exercise relate 
the lab to the lectures and the text, provide the necessary direction 
for terminal operation, and present questions regarding the computer 
output which the student answers after he has completed his work at 
the terminal. At first we tried to build directions and questions to 
be answered on-line into the computer programs, but we have concluded 
that this is too wasteful of computer time and terminal time. If,
someday, computer costs come down and the terminal queue is not a 
problem, more tutorial-type interactions can be provided. Meanwhile 
we continue to examine the problem of allocating course content to 
lecture, tests, mimeographed handouts and computer exercises. Let us 
turn now to a description of those exercises. 

The first lab provides experience with simple data manipulations 
such as transformations and descriptive statistics using a dataset 
stored on disc for this purpose. Those data are from a large educa- 
tional survey conducted at the University of Pittsburgh, called Project 
TALENT. This provides the student access to a random sample of Amer- 
ican high school students. He can select variables and subsamples 
(e.g., male or female) as he chooses. 
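The operations in this first lab (choosing a subsample, transforming a variable, and computing descriptive statistics) can be sketched as follows. The records and field names are hypothetical, made-up stand-ins; the actual layout of the Project TALENT file stored on disc is not described in this paper.

```python
import statistics

# Hypothetical records: made-up values, NOT Project TALENT data.
RECORDS = [
    {"sex": "M", "math": 22}, {"sex": "F", "math": 25},
    {"sex": "M", "math": 18}, {"sex": "F", "math": 30},
    {"sex": "M", "math": 27}, {"sex": "F", "math": 21},
]


def subsample(records, sex):
    """Select the records for one subgroup, e.g. males or females."""
    return [r for r in records if r["sex"] == sex]


def z_scores(values):
    """Linear transformation to standard scores (mean 0, standard deviation 1)."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]


def describe(values):
    """Basic descriptive statistics; the standard deviation uses the n - 1 divisor."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    for sex in ("M", "F"):
        scores = [r["math"] for r in subsample(RECORDS, sex)]
        print(sex, describe(scores))
```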

Then the student moves through a series of computer experiments 
designed to familiarize him with: 

(1) random number generation; 

(2) empirical and theoretical distributions; 

(3) sample statistics and population parameters;

(4) Monte Carlo study of sample variances; 

(5) symmetric and nonsymmetric binomial distributions; 

(6) central limit theorem and the normal distribution; 

(7) sampling distribution of the mean; 

(8) the t-distribution, power, type I and II errors; and 

(9) sampling distribution of the correlation coefficient. 
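Item (4) lends itself to a short sketch. Assuming the experiment compares the two common divisors for the sample variance (n and n - 1), repeated sampling from a population with unit variance shows that the divisor-n estimate averages about (n - 1)/n of the population variance, while the divisor n - 1 estimate averages close to the population variance itself. The sample size, number of replications, and seed below are illustrative choices, not values from the course.

```python
import random
import statistics


def variance_experiment(n=5, replications=10000, sigma=1.0, seed=1969):
    """Average the two variance estimators over many samples from N(0, sigma^2)."""
    rng = random.Random(seed)
    divisor_n, divisor_n_minus_1 = [], []
    for _ in range(replications):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        divisor_n.append(statistics.pvariance(sample))        # sum of squares / n
        divisor_n_minus_1.append(statistics.variance(sample))  # sum of squares / (n - 1)
    return statistics.fmean(divisor_n), statistics.fmean(divisor_n_minus_1)


if __name__ == "__main__":
    n = 5
    biased, unbiased = variance_experiment(n=n)
    print(f"divisor n: {biased:.3f}  divisor n - 1: {unbiased:.3f}  "
          f"(n - 1)/n: {(n - 1) / n:.3f}  population variance: 1.000")
```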

Experience with data analysis is also provided at appropriate 
points in the sequence. Students either enter their own data or use 
Project TALENT data for exercises with chi square, t-test, and
correlation. A current evaluation of this course suggests that the data
analysis portion should be expanded and that some of the initial random
number demonstrations should be shifted to filmed presentations of dice
and other "more concrete" experiments before turning to Monte Carlo
experiments on the computer.
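For the correlation exercise, the quantities a student sees in Printout 2 can be checked with a few lines of arithmetic. The sketch below is a reconstruction, not the original FORTRAN. It reads the mydata listing as nine (variable 1, variable 2) pairs; one value in the scanned listing is illegible and is taken here as 12, the only value consistent with the printed mean of 8.333 for variable 1. It also assumes that the "Z-SCORE STANDARD ERROR ESTIMATE" is the standard error of estimate for standardized scores, sqrt(1 - r^2), and that t is r sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom; under these assumptions the printed values are reproduced.

```python
import math
import statistics

# (variable 1, variable 2) for each of the nine subjects; the 12 in the second
# pair is inferred from the printed mean, not read from the listing.
PAIRS = [(14, 12), (12, 16), (11, 15), (7, 11), (6, 8),
         (5, 10), (8, 16), (3, 9), (9, 13)]


def correlation_analysis(pairs):
    """Descriptive statistics, Pearson r, and the t test of r = 0 for paired data."""
    x = [a for a, _ in pairs]
    y = [b for _, b in pairs]
    n = len(pairs)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return {
        "mean1": mx, "variance1": sxx / (n - 1), "sd1": math.sqrt(sxx / (n - 1)),
        "mean2": my, "variance2": syy / (n - 1), "sd2": math.sqrt(syy / (n - 1)),
        "r": r,
        "r_squared": r * r,
        "z_score_se_estimate": math.sqrt(1 - r * r),  # assumed meaning of the printed label
        "t": r * math.sqrt(n - 2) / math.sqrt(1 - r * r),
        "ndf": n - 2,
    }


if __name__ == "__main__":
    for name, value in correlation_analysis(PAIRS).items():
        print(f"{name:20s} {value:.4f}")
```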

Printout 1 illustrates a Monte Carlo study of the t-distribution,
and Printout 2 illustrates a correlation analysis in which the student
enters the data from the terminal. A batch processing version of the
computer programs developed for this lab is available in a new Wiley
text (Lohnes and Cooley, 1968).
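The dialogue in Printout 1 suggests the structure of the Monte Carlo t-test exercise: draw a number of pairs of samples from two unit-variance normal populations whose means differ by a chosen amount, compute t for each pair, and tabulate the results. The sketch below assumes a pooled-variance two-sample t and reads the class intervals off the printed lower limits (a catch-all first interval, then steps of 1/3 starting at -3.333). It is not the original FORTRAN, and Python's random generator will not reproduce the printed counts even with the same seed. For samples of 8, t has 14 degrees of freedom, so its theoretical variance is 14/12, about 1.17.

```python
import bisect
import math
import random
import statistics

# Lower limits as printed in Printout 1: a catch-all first interval, then steps of 1/3.
LOWER_LIMITS = [-99.0] + [-10 / 3 + k / 3 for k in range(21)]


def pooled_t(x, y):
    """Two-sample t statistic using the pooled estimate of the common variance."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * statistics.variance(x)
              + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))


def monte_carlo_t(pairs=200, size=8, difference=0.0, seed=68940213):
    """Draw `pairs` sample pairs of `size` observations; population means differ by `difference`."""
    rng = random.Random(seed)
    ts = []
    for _ in range(pairs):
        x = [rng.gauss(difference, 1.0) for _ in range(size)]
        y = [rng.gauss(0.0, 1.0) for _ in range(size)]
        ts.append(pooled_t(x, y))
    return ts


def frequency_table(ts):
    """Frequency and cumulative frequency for each interval [lower limit, next lower limit)."""
    freq = [0] * len(LOWER_LIMITS)
    for t in ts:
        index = bisect.bisect_right(LOWER_LIMITS, t) - 1
        freq[max(index, 0)] += 1  # anything below -99 is also counted in interval 1
    cumulative, running = [], 0
    for f in freq:
        running += f
        cumulative.append(running)
    return freq, cumulative


if __name__ == "__main__":
    ts = monte_carlo_t()
    print(f"THE MEAN = {statistics.fmean(ts):.4f}")
    print(f"THE STANDARD DEVIATION = {statistics.stdev(ts):.4f}")
    print(f"THE VARIANCE = {statistics.variance(ts):.4f}")
    freq, cum = frequency_table(ts)
    for i, (low, f, c) in enumerate(zip(LOWER_LIMITS, freq, cum), start=1):
        print(f"{i:3d} {low:9.3f} {f:5d} {c:5d}")
```

Rerunning with `difference=2.0` shifts the whole distribution of t to the right, which is the contrast the program's closing message invites the student to explore.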

Introduction to Multivariate Analysis 

The other statistics course in which we have been using the 
time-sharing system is a two-semester sequence in multivariate analysis. 
Here the emphasis for the computer lab has been on providing data
analysis experience for students from many divisions of the University
whose interests are very applied. They want to know how to select,
compute and interpret multivariate statistics in given research
situations.

As each multivariate technique is introduced, the student is 
responsible for conducting a computer analysis of his own, using 
either the Project TALENT dataset stored on disc or appropriate data 
from his own field, if available. Table 1 describes the function of 
each available program and Figure 1 indicates the input/output
compatibility which exists in this system. Printout 3 illustrates the first
page of a small discriminant analysis example. As the student moves 
through an analysis sequence (e.g., EDIT, CORREL, PRINCO, ROTATE), he 
catalogs and stores intermediate output on disc. 
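A rough analogue of this workflow is sketched below: one step computes a correlation matrix and catalogs it as a file, and a later step retrieves it and extracts principal components. The JSON storage, the helper names `catalog` and `retrieve`, and the use of the cyclic Jacobi method for the eigenvalue problem are assumptions for illustration; `correl` and `princo` borrow names from Table 1 only as labels and are not the original programs. The varimax step (ROTATE) is omitted, and the data in the demonstration are hypothetical.

```python
import json
import math
import os
import tempfile


def correl(rows):
    """Correlation matrix of the columns of `rows` (a list of equal-length lists)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]


def jacobi_eigen(a, tol=1e-12, max_sweeps=50):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def princo(r):
    """Principal components of a correlation matrix: eigenvalues (largest first) and loadings."""
    values, vectors = jacobi_eigen(r)
    order = sorted(range(len(values)), key=lambda k: values[k], reverse=True)
    loadings = [[vectors[i][k] * math.sqrt(max(values[k], 0.0)) for k in order]
                for i in range(len(r))]
    return [values[k] for k in order], loadings


def catalog(store, name, obj):
    """Store an intermediate result on disk under a catalog name."""
    with open(os.path.join(store, name + ".json"), "w") as f:
        json.dump(obj, f)


def retrieve(store, name):
    """Load a previously cataloged intermediate result."""
    with open(os.path.join(store, name + ".json")) as f:
        return json.load(f)


if __name__ == "__main__":
    # Hypothetical scores for six subjects on three variables.
    data = [[14, 12, 30], [11, 15, 28], [7, 11, 20], [6, 8, 25], [8, 16, 22], [3, 9, 18]]
    with tempfile.TemporaryDirectory() as store:
        catalog(store, "correl", correl(data))                # first step: correlations
        values, loadings = princo(retrieve(store, "correl"))  # later step: components
        print("eigenvalues:", [round(x, 3) for x in values])
        print("loadings:", [[round(x, 3) for x in row] for row in loadings])
```

Keeping each stage's output in a named file lets a later stage start from it without recomputing, which is the point of cataloging intermediate results.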

Of course if the objectives of the instruction were more in 
the direction of mathematical statistics than applied, the building 
blocks for such a computer lab could be matrix operations rather than 
specific statistical techniques. However, for the applied course, our 
approach allows the student to focus on concerns such as selection 
and interpretation, which are closer to his needs than would be, say, 
"reinventing" the matrix algebra for canonical correlation every time 
he was interested in exploring the relationships between two sets
of variables.

Plans for the Future

Following extensive use of the CAI laboratory exercises in
statistics developed by the project, future efforts will be devoted
to further increasing the effectiveness of the laboratory. Two avenues
will be explored. (a) One is to investigate the use of a more flexible
student terminal. Monte Carlo experiments will be moved to a
Sanders CRT terminal in order to see whether they are more effective
than they have been with a typewriter-terminal approach. (b) The
other is the development of a battery of computer-administered tests
which will help to further individualize instruction in statistical
inference. At the present time, although students work individually
at a terminal, all students take the same laboratory exercise in the
same week and have the same lecture and assignment. The long-range
intent behind the implementation of a computer testing procedure is
to redesign the course into a type of individually prescribed
instruction in which the computer does the testing, supplies the
laboratory experiences, and indicates suggested readings and
paper-and-pencil exercises based on the outcomes of the
computer-administered tests.

As I examine systems such as The Augmented Statistician (System 
Development Corporation, 1967) designed to provide the social scientist 
with interactive statistical programs, it seems clear that the instruc- 
tional and interactive production systems are heading toward similar 
goals. So I shall conclude as I began, with an expression of thanks 
to our hosts who have brought us together for this exchange of ideas 
on statistical computation. 
TABLE 1

Multivariate Programs on the System

Program   Function
CANON     Canonical correlation
CLASIF    Multivariate normal classification
COEFF     Factor score coefficients
CORREL    Correlation
COVAR     Covariance analysis
DISCRM    Multiple group discriminant analysis
FACDIS    Factorial discriminant analysis
FACTOR    Extraction of arbitrary factorial analysis
FSCORE    Factor scores
MANOVA    Multivariate analysis of variance
MULTR     Multiple correlation
PARTL     Multiple partial correlation
PRINCO    Principal components
ROTATE    Varimax or quartimax rotation

These programs were adapted from Cooley and Lohnes (1962).
POSSIBLE PROGRAM SEQUENCES


Printout 1

>$$logon 168wwc, size=32000.
ENTER PASSWORD
ACCEPTED

See itexplnin schedule about March 13.
Ready:

>$$size=32000.

>$$load d met.

MONTE CARLO ON T TEST

TYPE A 3 DIGIT NUMBER (200 OR SMALLER) GIVING THE NUMBER OF SAMPLE PAIRS
TO BE DRAWN
>200

TYPE A 2 DIGIT NUMBER (10 OR SMALLER) GIVING THE SIZE OF EACH SAMPLE
>08

BOTH POPULATIONS SAMPLED HAVE UNIT VARIANCE BUT MEANS MAY BE MADE TO DIFFER
TYPE 4 CHARACTERS (WITH DECIMAL) BETWEEN -2.0 AND +2.0, INDICATING DESIRED
DIFFERENCE
> 0.0

TYPE IN ANY EIGHT DIGIT "RANDOM" NUMBER TO START THE RANDOM GENERATOR.
>68940213

******* DISTRIBUTION OF T'S *******

THE MEAN = 0.0771
THE STANDARD DEVIATION = 1.0330
THE VARIANCE = 1.0671

FREQUENCY AND CUMULATIVE FREQUENCY DISTRIBUTION

INTERVAL   LOWER LIMIT   FREQUENCY   CUM. FREQ.
    1        -99.000          0            0
    2         -3.333          0            0
    3         -3.000          0            0
    4         -2.667          2            2
    5         -2.333          5            7
    6         -2.000          6           13
    7         -1.667          7           20
    8         -1.333          5           25
    9         -1.000         17           42
   10         -0.667         24           66
   11         -0.333         32           98
   12          0.000         16          114
   13          0.333         33          147
   14          0.667         17          164
   15          1.000         12          176
   16          1.333         16          192
   17          1.667          4          196
   18          2.000          1          197
   19          2.333          0          197
   20          2.667          2          199
   21          3.000          0          199
   22          3.333          1          200

TRY RUNNING THIS PROGRAM AGAIN WHEN THE NULL HYPOTHESIS IS FALSE.




Printout 2

>$$det all.

>$$P

>$$list mydata
14. 12.
12. 16.
11. 15.
07. 11.
06. 08.
05. 10.
08. 16.
03. 09.
09. 13.

>$$det all.

>$$att d mydata as F8.
>$$load d studat.

LOADING STARTS AT LOC 060200
EXTERNAL SYMBOL TABLE



CORRELATION ANALYSIS OF STUDENTS DATA

SUPPLY THE NUMBER OF SUBJECTS ON THE DATASET YOU HAVE ATTACHED
AS A 3-DIGIT INTEGER.
>009

SUPPLY THE NUMBER OF VARIABLES CONTAINED ON THE DATASET YOU HAVE
ATTACHED AS A 1-DIGIT INTEGER BETWEEN 2 AND 8.
>2

CORRELATION ANALYSIS BETWEEN VARIABLES 1 AND 2 FOR 9 SUBJECTS.

        VARIABLE 1                 VARIABLE 2
MEAN     =  8.333          MEAN     = 12.222
VARIANCE = 12.500          VARIANCE =  8.944
ST DEV   =  3.5355         ST DEV   =  2.9907

CORRELATION COEFFICIENT R = 0.654
R-SQUARED = 0.4279
Z-SCORE STANDARD ERROR ESTIMATE = 0.7564

T CALCULATED FROM ABOVE R:
T = 2.288 WITH NDF = 7

WOULD YOU LIKE TO TRY THIS PROGRAM AGAIN?
>no

WHEN YOU HAVE COMPLETED YOUR WORK AT THE TERMINAL, BE SURE TO TYPE $$LOGOFF
"SEE YOU NEXT WEEK!"
Printout 3

>$$att d tandvv as F7.

>$$att d means as F8.

>$$att disk as F9.

>$$load d main.

LOADING STARTS AT LOC 088200

EXTERNAL SYMBOL TABLE
MAIN 0

MULTIPLE DISCRIMINANT ANALYSIS, COMPILED 21 JAN 69

SUPPLY THE NUMBER OF VARIABLES AS A TWO DIGIT INTEGER NOT GREATER THAN 20.
> 2

SUPPLY THE NUMBER OF GROUPS AS A TWO DIGIT INTEGER NOT GREATER THAN 20.
> 3

SUPPLY THE NUMBER OF SUBJECTS AS A 4 DIGIT INTEGER.
> 196

SUPPLY THE NUMBER OF CONTROL VARIABLES PREVIOUSLY PARTIALED OUT BY
COVAR AS A TWO DIGIT INTEGER.
>00

F-RATIO FOR H2, OVERALL DISCRIMINATION = 2.15
NDF1 = 4 AND NDF2 = 384

CHI-SQUARE TESTS WITH SUCCESSIVE ROOTS REMOVED

ROOTS                                          CHI                    PERCENT
REMOVED   CANONICAL R   R SQUARED   EIGENVALUE   SQUARE   NDF   LAMBDA   TRACE
   0         0.208        0.043        0.045      8.51     6     0.96    99.87
   1         0.008        0.000        0.000      0.01     2     1.00     0.13

ROW COEFFICIENTS VECTORS
DF 1     0.0043032    0.0494752
DF 2    -0.0557285    0.0978380

FACTOR PATTERN FOR DISCRIMINANT FUNCTIONS
TEST
  1      0.888   -0.449
  2      0.992    0.077

COMMUNALITIES FOR DISCRIMINANT FACTORS
  1      0.990
  2      0.990

PERCENTAGE OF TRACE OF R ACCOUNTED FOR BY EACH ROOT
  1   88.611     2   10.372
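Several figures on the first page of Printout 3 can be recomputed from the two printed canonical correlations, assuming the standard large-sample formulas: Wilks' lambda as the product of (1 - R^2) over the retained roots, Bartlett's chi-square -(N - 1 - (p + g)/2) ln(lambda), Rao's F approximation for overall discrimination, and each root's eigenvalue R^2/(1 - R^2) as a percentage of their sum. This is a hypothetical check, not the original program; it does not attempt the NDF column or the coefficient vectors, and because the printed R values are rounded to three decimals, only approximate agreement (about 8.5 for the first chi-square, about 2.15 for F on 4 and 384 degrees of freedom) should be expected.

```python
import math

N, P, G = 196, 2, 3            # subjects, variables, groups as typed in Printout 3
CANONICAL_R = [0.208, 0.008]   # as printed (rounded to three decimals)


def wilks_lambda(rs):
    """Wilks' lambda from canonical correlations: product of (1 - R^2)."""
    return math.prod(1 - r * r for r in rs)


def bartlett_chi_square(rs, removed, n=N, p=P, g=G):
    """Bartlett's chi-square for the roots that remain after `removed` roots are set aside."""
    return -(n - 1 - (p + g) / 2) * math.log(wilks_lambda(rs[removed:]))


def rao_f(rs, n=N, p=P, g=G):
    """Rao's F approximation to Wilks' lambda, with its two degrees of freedom."""
    q = g - 1
    s = math.sqrt((p * p * q * q - 4) / (p * p + q * q - 5)) if p * p + q * q > 5 else 1.0
    df1 = p * q
    df2 = (n - 1 - (p + g) / 2) * s - df1 / 2 + 1
    root = wilks_lambda(rs) ** (1 / s)
    return (1 - root) / root * df2 / df1, df1, df2


def percent_trace(rs):
    """Each root's eigenvalue R^2 / (1 - R^2) as a percentage of their sum."""
    eig = [r * r / (1 - r * r) for r in rs]
    return [100 * e / sum(eig) for e in eig]


if __name__ == "__main__":
    f, df1, df2 = rao_f(CANONICAL_R)
    print(f"F = {f:.2f} with {df1} and {df2:.0f} degrees of freedom")
    for k in range(len(CANONICAL_R)):
        print(f"roots removed {k}: chi-square = {bartlett_chi_square(CANONICAL_R, k):.2f}")
    print("percent of trace:", [round(x, 2) for x in percent_trace(CANONICAL_R)])
```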